This vignette outlines the steps of applying STEAM on the 10X Visium DLPFC (Dorsolateral Prefrontal Cortex) data. Four different classification models are implemented in STEAM to evaluate clustering results, including Random Forest, Support Vector Machines, XGBoost, and Multinomial Logistic Regression.
Here, cell-type classification is performed by incorporating spatial neighborhood analysis and Random Forest model with cross validation.
Before using STEAM, we recommend preprocessing your spatial transcriptomics data using the Seurat framework, as it provides a robust set of tools for spatial data normalization, scaling, and visualization. The Satija Lab offers detailed vignettes to guide you through spatial data preprocessing:
To run STEAM, you need the following components:
library(STEAM)
## Loading required package: ggplot2
## Loading required package: RANN
## Loading required package: patchwork
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Package: STEAM
## Description: Evaluating consistency and reliability of clustering in spatial omics using STEAM
## Version: 1.2
## Release Date: 2025-02-01
## Authors: Samantha Reynoso, Courtney Schiebout, Revanth Krishna, Fan Zhang (University of Colorado School of Medicine - CU Medicine)
## Maintainer: Revanth Krishna <revanth.krishna@cuanschutz.edu>
library(Seurat)
## Loading required package: SeuratObject
## Loading required package: sp
##
## Attaching package: 'SeuratObject'
## The following objects are masked from 'package:base':
##
## intersect, t
DLPFC.RDS is sample 151669 from the DLPFC dataset from Visium by 10X
Genomics. This data is already preprocessed and includes a
SCTransform scaled data matrix by Seurat,
spatial coordinates, and pre-identified cluster labels
DLPFCs <- readRDS("~/Desktop/STEAM/DLPFC/DLPFC12.RDS")
DLPFC <- DLPFCs$`151669`
Parameters
Seurat.obj — a Seurat object with counts
and metadata.label.column — metadata column containing the
ground‑truthlabels (e.g., cortical layer, cell type).assay — which assay to read (e.g., "SCT",
"RNA").Outputs - STEAM.obj$count_exp,
STEAM.obj$labels, STEAM.obj$spatial (if
present in the Seurat object), and optional embeddings (PCA/UMAP) if
available.
You can input this data into STEAM in two ways:
STEAM.obj <- LoadSTEAM(
Seurat.obj = DLPFC,
label.column = "Labels",
assay = "SCT"
)
## Using GetTissueCoordinates() to extract spatial data.
Parameters
count_exp — expression matrix.spatial — coordinates; must align rownames with cell
IDs used elsewhere.labels — named vector of reference labels (ground
truth).pca, umap, clusters —
optional; used for diagnostics/plots.Common checks - Row/column names must be consistent across objects.
labels must cover all cells you plan to evaluate.Key parameters
-mode — "nested" performs outer
and inner cross‑validation (outer for unbiased evaluation;
inner for tuning). -n_outer_folds,
n_inner_folds — fold counts. -cv.cores —
parallel workers for outer folds. -allowParallel — whether
caret may parallelize inner CV (often FALSE to avoid nested
parallelism contention). -model: The machine learning model
to use for classification.
- `rf` = RandomForest
- `svm` = Support Vector Machines,
- `xgb` = XGBoost
- `multinom` = Multinomial Logistic Regression
metric — caret optimization metric (e.g.,
"Accuracy", "Kappa").maxnweights, n.size — package‑specific
compute/feature knobs.Outputs
STEAM.obj$nested$ncv — nested CV object
containing:outer_result[[i]]$preds — outer test
predictions per fold with predy (prediction),
testy (truth), and per‑class probabilities.outer_result[[i]]$fit$pred — inner CV
predictions used for tuning.NOTE: custom model parameters and grids are at the bottom of this document.
STEAM.obj <- RunSTEAM(
STEAM.obj, mode = "nested",
n_outer_folds = 5, n_inner_folds = 3,
cv.cores = 10,
allowParallel = FALSE,
n.size = 5,
model = "rf",
metric = "Kappa",
maxnweights = 15000
)
## Creating nested CV folds using method: stratified
## === STRATIFIED CV DIAGNOSTICS ===
## Method used: stratified
## Outer folds: 5
## Fold sizes: 729, 727, 727, 727, 726
## Fold size balance ratio: 1
## Overall class balance ratio: 1.01 (closer to 1.0 = better balance)
##
## === STRATIFICATION QUALITY METRICS ===
## Overall Quality Index: 0.981 (0-1 scale, higher = better, calibrated)
## Overall Chi-squared: 0.023 (lower = better, threshold=47.4)
## Overall CV: 0.003 (lower = better, threshold=0.3)
## Overall MAD: 0.001 (lower = better, threshold=0.294)
##
## Normalized penalty components:
## Chi-squared penalty: 0 (weight: 0.4)
## CV penalty: 0.009 (weight: 0.35)
## MAD penalty: 0.002 (weight: 0.25)
## Combined penalty: 0.004 (0=perfect, 1=poor)
##
## Per-class quality metrics:
## Layer_4: χ²=0.011, CV=0.005, MAD=0.001, p=1
## Layer_6: χ²=0.01, CV=0.004, MAD=0.001, p=1
## Layer_3: χ²=0.002, CV=0.001, MAD=0.001, p=1
## WM: χ²=0, CV=0.002, MAD=0, p=1
## Layer_5: χ²=0, CV=0.002, MAD=0, p=1
##
## EXCELLENT stratification quality
##
## Class distributions per fold:
## Fold 1: Layer_3:429, Layer_4:73, Layer_5:102, Layer_6:79, WM:46
## Fold 2: Layer_3:428, Layer_4:73, Layer_5:102, Layer_6:78, WM:46
## Fold 3: Layer_3:428, Layer_4:73, Layer_5:102, Layer_6:78, WM:46
## Fold 4: Layer_3:428, Layer_4:73, Layer_5:102, Layer_6:78, WM:46
## Fold 5: Layer_3:428, Layer_4:72, Layer_5:102, Layer_6:78, WM:46
## ================================
## Performing 5-fold outer CV, using 10 cores
## Duration: 3.969475 mins
## Finished NestedCV training (outer+inner with leakage-safe averaging)
## Collected NestedCV results
ViewMetrics() — tabular summaries overall / per
fold.plot_misclassified_cells() — visualize errors spatially
or by fold.feature_importance() — model‑specific importance (e.g.,
RF Gini/perm).feature_expression() — expression of a gene/feature in
different views.ViewMetrics(STEAM.obj, view = "overall")
# To view specific folds: ViewMetrics(STEAM.obj, fold = 1)
plot_misclassified_cells(STEAM.obj, fold = 1)
# plot all misclassified cells across all folds
# plot_misclassified_cells(STEAM.obj)
feature_importance(STEAM.obj)
## MBP TMSB10 MT-ND1 PCP4 TUBA1B MT-ATP6 MT-ND2 PLP1
## 241.85374 178.18029 52.00756 51.62953 44.24113 33.78553 22.33203 21.82176
## MT-CO2 CCK
## 21.38931 20.98617
feature_expression(STEAM.obj, "MBP", view = "pooled")
#feature_expression(STEAM.obj, "MBP", view = "facet")
The STEAMCorrection system leverages spatial neighborhood information to identify and correct misclassified cells from your cross-validation results. This approach is based on the biological principle that spatially adjacent cells often share similar tissue types or functional states.
STEAMCorrection works by:
1.Identifying misclassified cells from nested CV results 2.Finding spatial neighbors using k-nearest neighbors 3.Applying hierarchical voting where neighbors vote on the correct cell type 4.Making corrections iteratively with adaptive consensus thresholds 5.Tracking progress across multiple correction rounds
The algorithm uses a hierarchical voting system:
This automatically processes all cross-validation folds with sensible defaults:
k = 8 neighborsmin_consensus = 0.6 (60% agreement for primary
corrections)max_iterations = 10For optimal results, you may need to adjust parameters based on your data characteristics:
Parameter Guidelines:
# Check correction summary across all folds
correction_results <- STEAM.obj$spatial_anchor_analysis$correction_results
for(fold_name in names(correction_results)) {
fold_data <- correction_results[[fold_name]]
cat(sprintf("Fold %d: %.1f%% → %.1f%% accuracy (%d corrections)\n",
fold_data$fold_number,
fold_data$initial_accuracy * 100,
fold_data$final_accuracy * 100,
fold_data$total_corrections))
}
## Fold 1: 79.4% → 90.1% accuracy (492 corrections)
## Fold 2: 78.4% → 90.0% accuracy (515 corrections)
## Fold 3: 78.3% → 90.0% accuracy (436 corrections)
## Fold 4: 80.3% → 90.1% accuracy (498 corrections)
## Fold 5: 81.7% → 93.1% accuracy (425 corrections)
# Show only key iterations
IterativePlot(STEAM.obj, fold = 1, iterations = c(0, 1, 2, 3))
## Showing 4 selected iterations: 0, 1, 2, 3
## Warning in get_plot_component(plot, "guide-box"): Multiple components found;
## returning the first one. To return all, use `return_all = TRUE`.
# Show correction progress for a specific fold
#IterativePlot(STEAM.obj, fold = 1)
# Bar chart showing accuracy gains per fold
AccuracyPairedBarChart(STEAM.obj, show_corrections = FALSE)
# Analyze when spatial correction works best
NeighborhoodHomogeneityAnalysis(STEAM.obj)
## Auto-detected k=8 from stored parameters
# Which tissue types benefit most from correction?
ErrorAnalysisPlot(STEAM.obj)
High success indicators:
Correction challenges:
Common Issues:
No corrections applied:
getMisclassifiedCells(STEAM.obj)min_consensus thresholdmax_voting_levelsToo many corrections:
min_consensusk (fewer neighbors)Poor correction quality:
NeighborhoodHomogeneityAnalysis() resultsSlow performance:
max_iterationsk values1.Start with defaults then tune based on your tissue characteristics 2.Examine results visually using the plotting functions 3.Validate corrections make biological sense for your system 4.Consider tissue boundaries - corrections near boundaries may be less reliable 5.Document parameters used for reproducibility 6.Cross-validate correction performance on held-out data when possible
sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 15.6.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Denver
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] Seurat_5.3.0 SeuratObject_5.1.0 sp_2.2-0 STEAM_1.2
## [5] dplyr_1.1.4 patchwork_1.3.0 RANN_2.6.2 ggplot2_3.5.2
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 shape_1.4.6.1 rstudioapi_0.17.1
## [4] jsonlite_2.0.0 magrittr_2.0.3 spatstat.utils_3.1-3
## [7] farver_2.1.2 rmarkdown_2.29 vctrs_0.6.5
## [10] ROCR_1.0-11 matrixTests_0.2.3 spatstat.explore_3.4-2
## [13] htmltools_0.5.8.1 pROC_1.18.5 caret_7.0-1
## [16] sass_0.4.10 sctransform_0.4.2 parallelly_1.43.0
## [19] KernSmooth_2.23-26 bslib_0.9.0 htmlwidgets_1.6.4
## [22] ica_1.0-3 plyr_1.8.9 plotly_4.10.4
## [25] zoo_1.8-14 lubridate_1.9.4 cachem_1.1.0
## [28] igraph_2.1.4 mime_0.13 lifecycle_1.0.4
## [31] iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.7-3
## [34] R6_2.6.1 fastmap_1.2.0 aricode_1.0.3
## [37] fitdistrplus_1.2-2 future_1.40.0 shiny_1.10.0
## [40] digest_0.6.37 colorspace_2.1-1 tensor_1.5
## [43] RSpectra_0.16-2 irlba_2.3.5.1 labeling_0.4.3
## [46] progressr_0.15.1 randomForest_4.7-1.2 spatstat.sparse_3.1-0
## [49] timechange_0.3.0 httr_1.4.7 polyclip_1.10-7
## [52] abind_1.4-8 compiler_4.4.1 proxy_0.4-27
## [55] doParallel_1.0.17 withr_3.0.2 viridis_0.6.5
## [58] fastDummies_1.7.5 MASS_7.3-65 lava_1.8.1
## [61] ModelMetrics_1.2.2.2 tools_4.4.1 lmtest_0.9-40
## [64] httpuv_1.6.16 future.apply_1.11.3 goftest_1.2-3
## [67] nnet_7.3-20 glue_1.8.0 nlme_3.1-168
## [70] promises_1.3.2 grid_4.4.1 Rtsne_0.17
## [73] cluster_2.1.8.1 reshape2_1.4.4 generics_0.1.3
## [76] recipes_1.3.0 gtable_0.3.6 spatstat.data_3.1-6
## [79] class_7.3-23 tidyr_1.3.1 data.table_1.17.0
## [82] spatstat.geom_3.3-6 RcppAnnoy_0.0.22 ggrepel_0.9.6
## [85] foreach_1.5.2 pillar_1.10.2 stringr_1.5.1
## [88] spam_2.11-1 RcppHNSW_0.6.0 later_1.4.2
## [91] splines_4.4.1 lattice_0.22-7 FNN_1.1.4.1
## [94] deldir_2.0-4 survival_3.8-3 tidyselect_1.2.1
## [97] miniUI_0.1.2 pbapply_1.7-2 knitr_1.50
## [100] gridExtra_2.3 scattermore_1.2 RhpcBLASctl_0.23-42
## [103] stats4_4.4.1 xfun_0.52 nestedcv_0.8.0
## [106] hardhat_1.4.1 timeDate_4041.110 matrixStats_1.5.0
## [109] stringi_1.8.7 lazyeval_0.2.2 yaml_2.3.10
## [112] evaluate_1.0.3 codetools_0.2-20 tibble_3.2.1
## [115] cli_3.6.5 RcppParallel_5.1.10 uwot_0.2.3
## [118] rpart_4.1.24 xtable_1.8-4 reticulate_1.42.0
## [121] jquerylib_0.1.4 dichromat_2.0-0.1 Rcpp_1.0.14
## [124] zigg_0.0.2 spatstat.random_3.3-3 globals_0.17.0
## [127] png_0.1-8 Rfast_2.1.5.1 spatstat.univar_3.1-2
## [130] parallel_4.4.1 gower_1.0.2 mclust_6.1.1
## [133] dotCall64_1.2 glmnet_4.1-10 listenv_0.9.1
## [136] viridisLite_0.4.2 ipred_0.9-15 scales_1.4.0
## [139] prodlim_2025.04.28 e1071_1.7-16 ggridges_0.5.6
## [142] purrr_1.0.4 rlang_1.1.6 cowplot_1.1.3